-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
layer file creation: fatal_err on timeline dir fsync #6985
Merged
problame
merged 7 commits into
main
from
problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
Mar 4, 2024
Merged
layer file creation: fatal_err on timeline dir fsync #6985
problame
merged 7 commits into
main
from
problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
Mar 4, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The `writer.finish()` methods already fsync the inode, using `VirtualFile::sync_all()`. All that the callers need to do is fsync their directory, i.e., the timeline directory. Note that there's a call in the new compaction code that is apparently dead-at-runtime, so, I couldn't fix up any fsyncs there [Link](https://github.com/neondatabase/neon/blob/502b69b33bbd4ad1b0647e921a9c665249a2cd62/pageserver/src/tenant/timeline/compaction.rs#L204-L211). In the grand scheme of things, layer durability probably doesn't matter anymore because the remote storage is authoritative at all times as of #5198. But, let's not break the discipline in htis commit. part of #6663
As pointed out in the comments added in this PR: the in-memory state of the filesystem already has the layer file in its final place. If the fsync fails, but pageserver continues to execute, it's quite easy for subsequent pageserver code to observe the file being there and assume it's durable, where it really isn't. It can happen that we get ENOSPC during the fsync. However, 1. the timeline dir is small (remember, the big layer _file_ has already been synced). Small data means ENOSPC due to delayed allocation races etc are less likely. 2. what elase are we going to do in that case? If we decide to bubble up the error, the file remains on disk. We could try to unlink it and fsync after the unlink. If that fails, we would _definitely_ need to error out. Is it worth the trouble though? Side note: all this logic about not carrying on after fsync failure implies that we `sync` the filesystem successfully before we restart the pageserver. Our systemd unit currently does not do that, but should.
…kio-epoll-uring/layer-write-path-fsync-cleanups
…sync-cleanups' into problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
koivunej
approved these changes
Mar 1, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking fine. Because we haven't yet added these layers to the upload queue and then uploaded a new version of index_part.json, we will delete them on the next restart (and not fsync while doing that).
2484 tests run: 2362 passed, 0 failed, 122 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
00bf050 at 2024-03-04T12:26:25.853Z :recycle: |
jcsp
approved these changes
Mar 1, 2024
…kio-epoll-uring/layer-write-path-fsync-cleanups
…sync-cleanups' into problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
Base automatically changed from
problame/integrate-tokio-epoll-uring/layer-write-path-fsync-cleanups
to
main
March 4, 2024 11:33
…layer-fatal-err-on-fsync
problame
deleted the
problame/integrate-tokio-epoll-uring/create-layer-fatal-err-on-fsync
branch
March 4, 2024 12:18
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As pointed out in the comments added in this PR:
the in-memory state of the filesystem already has the layer file in its final place.
If the fsync fails, but pageserver continues to execute, it's quite easy
for subsequent pageserver code to observe the file being there and
assume it's durable, when it really isn't.
It can happen that we get ENOSPC during the fsync.
However,
Small data means ENOSPC due to delayed allocation races etc are less likely.
If we decide to bubble up the error, the file remains on disk.
We could try to unlink it and fsync after the unlink.
If that fails, we would definitely need to error out.
Is it worth the trouble though?
Side note: all this logic about not carrying on after fsync failure
implies that we
sync
the filesystem successfully before we restartthe pageserver. We don't do that right now, but should (=> #6989)
part of #6663